Body segmentation is an important step in many computer vision problems involving human images and one of the key components that affects the performance of all downstream tasks. Several prior works have approached this problem using a multi-task model that exploits correlations between different tasks to improve segmentation performance. Based on the success of such solutions, we present in this paper a novel multi-task model for human segmentation/parsing that involves three tasks, i.e., (i) keypoint-based skeleton estimation, (ii) dense pose prediction, and (iii) human-body segmentation. The main idea behind the proposed Segmentation-Pose-DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks. SPD is based on a shared deep neural network backbone that branches off into three task-specific model heads and is learned using a multi-task optimization objective. The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model. Comprehensive ablation studies are also presented. Our experimental results show that the proposed multi-task (segmentation) model is highly competitive and that the introduction of additional tasks contributes towards a higher overall segmentation performance.
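To make the shared-backbone design concrete, below is a minimal PyTorch sketch of a backbone that branches into three task-specific heads and is trained with a weighted multi-task objective. The layer choices, head shapes, and loss weights are illustrative assumptions, not the actual SPD architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskSketch(nn.Module):
    """Shared backbone with three task-specific heads (illustrative)."""
    def __init__(self, num_seg_classes=20, num_keypoints=17, densepose_channels=27):
        super().__init__()
        # Shared feature extractor (stand-in for a real segmentation backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        # Task-specific heads branching off the shared features.
        self.seg_head = nn.Conv2d(128, num_seg_classes, 1)           # body parsing
        self.pose_head = nn.Conv2d(128, num_keypoints, 1)            # keypoint heatmaps
        self.densepose_head = nn.Conv2d(128, densepose_channels, 1)  # dense pose

    def forward(self, x):
        feats = self.backbone(x)
        return self.seg_head(feats), self.pose_head(feats), self.densepose_head(feats)

# Multi-task objective: weighted sum of per-task losses (weights are assumptions).
def multi_task_loss(seg_out, pose_out, dp_out, seg_gt, pose_gt, dp_gt, w=(1.0, 0.5, 0.5)):
    l_seg = F.cross_entropy(seg_out, seg_gt)
    l_pose = F.mse_loss(pose_out, pose_gt)
    l_dp = F.cross_entropy(dp_out, dp_gt)
    return w[0] * l_seg + w[1] * l_pose + w[2] * l_dp
```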
Image-based virtual try-on techniques have shown great promise for enhancing the user experience and improving customer satisfaction on fashion-oriented e-commerce platforms. However, existing techniques are currently still limited in the quality of the try-on results they are able to produce from input images of diverse characteristics. In this work, we propose a Context-Driven Virtual Try-On Network (C-VTON) that addresses these limitations and convincingly transfers selected clothing items to the target subjects, even under challenging pose configurations and in the presence of self-occlusions. At the core of the C-VTON pipeline are: (i) a geometric matching procedure that efficiently aligns the target clothing with the pose of the person in the input images, and (ii) a powerful image generator that utilizes various types of contextual information when synthesizing the final try-on result. C-VTON is evaluated in rigorous experiments on the VITON and MPV datasets and in comparison to state-of-the-art techniques from the literature. Experimental results show that the proposed approach is able to produce photo-realistic and visually convincing results and significantly improves on the existing state-of-the-art.
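As a rough illustration of the geometric matching step, the sketch below warps a clothing image with a predicted affine transform using PyTorch grid sampling. In practice the transform parameters would come from a matching network, and the actual C-VTON procedure may rely on a different transformation model; this is a hedged stand-in.

```python
import torch
import torch.nn.functional as F

def warp_clothing(clothing, theta):
    """Warp a clothing image with a 2x3 affine transform.

    clothing: (N, 3, H, W) tensor; theta: (N, 2, 3) affine parameters,
    e.g., regressed by a geometric matching network (illustrative stand-in
    for C-VTON's actual matching procedure).
    """
    grid = F.affine_grid(theta, clothing.size(), align_corners=False)
    return F.grid_sample(clothing, grid, align_corners=False)

# Example: the identity transform leaves the garment unchanged.
cloth = torch.rand(1, 3, 256, 192)
theta = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
warped = warp_clothing(cloth, theta)
```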
Recent state-of-the-art face recognition (FR) approaches have achieved impressive performance, yet unconstrained face recognition still represents an open problem. Face image quality assessment (FIQA) approaches aim to estimate the quality of input samples, which can help provide information on the confidence of the recognition decision and eventually lead to improved results in challenging scenarios. While much progress has been made in face image quality assessment in recent years, computing reliable quality scores for diverse facial images and FR models remains challenging. In this paper, we propose a novel approach to face image quality assessment, called FaceQAN, that is based on adversarial examples and relies on the analysis of adversarial noise, which can be computed with any FR model trained using some form of gradient descent. As such, the proposed approach is the first to link image quality to adversarial attacks. Comprehensive (cross-model as well as model-specific) experiments are conducted on four benchmark datasets, i.e., LFW, CFP-FP, XQLFW and IJB-C, with four FR models, i.e., CosFace, ArcFace, CurricularFace and ElasticFace, and in comparison to seven state-of-the-art FIQA methods to demonstrate the performance of FaceQAN. Experimental results show that FaceQAN achieves competitive results, while exhibiting several desirable characteristics.
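The sketch below conveys the underlying idea: quality is linked to how stable the face embedding is under a gradient-based (FGSM-style) perturbation. It is a hedged approximation of the concept, not FaceQAN's exact algorithm, and the toy model at the end is purely a stand-in.

```python
import torch
import torch.nn.functional as F

def adversarial_quality(fr_model, face, eps=0.01, noise_scale=0.001):
    """Quality proxy: stability of the face embedding under an FGSM-style
    attack (illustrative sketch; not FaceQAN's exact procedure)."""
    with torch.no_grad():
        ref = F.normalize(fr_model(face), dim=-1)  # reference embedding
    # Start from a small random perturbation so the gradient is non-zero.
    adv = (face + noise_scale * torch.randn_like(face)).requires_grad_(True)
    sim = (F.normalize(fr_model(adv), dim=-1) * ref).sum()
    sim.backward()  # gradient of the similarity to the clean embedding
    adv = adv.detach() - eps * adv.grad.sign()  # step that reduces similarity
    with torch.no_grad():
        emb_adv = F.normalize(fr_model(adv), dim=-1)
        # Embeddings that resist the attack (high residual similarity)
        # indicate higher-quality input samples.
        return (ref * emb_adv).sum(dim=-1)

# Toy usage with a stand-in embedding model (not a real FR network).
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 112 * 112, 128))
score = adversarial_quality(toy_model, torch.rand(1, 3, 112, 112))
```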
We introduce a novel method for reconstructing 3D objects using a set of volumetric primitives, i.e., superquadrics. The method hierarchically decomposes a target 3D object into pairs of superquadrics, recovering ever finer details in the process. While such hierarchical methods have been studied before, we introduce a new way of splitting the object space that relies only on the predicted superquadric parameters. The method is trained and evaluated on the ShapeNet dataset. Our experimental results suggest that reasonable reconstructions can be obtained with the proposed method for a diverse set of objects with complex geometry.
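At the heart of such methods is the standard superquadric inside-outside function, which the predicted size and shape parameters define. A small NumPy sketch for intuition (the parameter defaults are arbitrary examples):

```python
import numpy as np

def superquadric_F(x, y, z, a=(1.0, 1.0, 1.0), e=(1.0, 1.0)):
    """Superquadric inside-outside function: F < 1 inside the primitive,
    F == 1 on its surface, F > 1 outside. a = axis scales, e = shape exponents."""
    a1, a2, a3 = a
    e1, e2 = e
    xy = (np.abs(x / a1) ** (2.0 / e2) + np.abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return xy + np.abs(z / a3) ** (2.0 / e1)

# e = (1, 1) gives an ellipsoid; exponents near 0 approach a box.
print(superquadric_F(0.0, 0.0, 0.5))  # 0.25 -> inside the unit sphere
```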
Machine learning (ML) is now widely accessible to the research community at large, which has facilitated a proliferation of novel and compelling applications of these emerging mathematical techniques across a wide range of disciplines. In this paper, we focus on a specific case study: the field of paleoanthropology, which seeks to understand the evolution of the human species based on biological and cultural evidence. As we will show, the ease of use of ML algorithms, combined with a lack of expertise regarding their proper use within the anthropological research community, has led to fundamental misapplications that appear throughout the literature. The resulting unreliable findings not only undermine efforts to legitimately incorporate ML into anthropological research, but also potentially distort our understanding of humanity's evolutionary and behavioral past. The aim of this paper is to briefly introduce some of the ways in which ML has been applied in paleoanthropology; we also provide a survey of some basic ML algorithms for those who are not fully familiar with this still actively developing field. We discuss a series of mistakes, errors, and violations of correct ML methodological protocols that appear with troubling frequency across the accumulated body of anthropological literature. These errors include the use of outdated algorithms and practices; inappropriate train/test splits, sample composition, and textual interpretation; and a lack of transparency due to the absence of data and code sharing and the consequent limits on independent replication. We assert that expanding sample sizes, sharing data and code, re-evaluating peer-review approaches, and, most importantly, building interdisciplinary teams that include ML experts are all necessary for the progress of future research that incorporates ML in anthropology.
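As a minimal illustration of one of the flagged pitfalls, the sketch below shows a leakage-free train/test protocol with scikit-learn: a single stratified split with a fixed seed, and preprocessing fitted on the training portion only. The dataset here is synthetic stand-in data, not anthropological measurements.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)  # stand-in data

# Split once, up front, with a fixed seed and class stratification.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Fit all preprocessing on the training portion only -- fitting the scaler
# on the full dataset would leak test-set statistics into training.
scaler = StandardScaler().fit(X_tr)
clf = LogisticRegression().fit(scaler.transform(X_tr), y_tr)
print(clf.score(scaler.transform(X_te), y_te))
```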
This paper presents a summary of the Competition on Face Morphing Attack Detection Based on Privacy-Aware Synthetic Training Data (SYN-MAD), held at the 2022 International Joint Conference on Biometrics (IJCB 2022). The competition attracted 12 participating teams from academia and industry, based in 11 different countries. In the end, the participating teams submitted seven valid entries, which were evaluated by the organizers. The goal of the competition was to present and attract solutions that detect face morphing attacks while protecting people's privacy for ethical and legal reasons. To ensure this, the training data was restricted to synthetic data provided by the organizers. The submitted solutions introduced innovations that outperformed the considered baseline in many experimental settings. The evaluation benchmark is now available at: https://github.com/marcohuber/syn-mad-2022.
Images of morphed faces pose a serious threat to face recognition-based security systems, as they can be used to illegitimately verify the identities of multiple people with a single morphed image. Modern detection algorithms learn to identify such morphing attacks using real images of real individuals. This approach raises various privacy concerns and limits the amount of publicly available training data. In this paper, we explore the efficacy of detection algorithms trained only on faces of non-existent people and their respective morphs. To this end, two dedicated detection algorithms were trained on synthetic data and then evaluated on three real-world datasets, namely FRLL-Morphs, FERET-Morphs, and FRGC-Morphs. Our results show that synthetic facial images can be successfully employed in the training process of detection algorithms, which generalize well to real-world scenarios.
Current state-of-the-art segmentation techniques for ocular images are critically dependent on large-scale annotated datasets, which are labor-intensive to gather and often raise privacy concerns. In this paper, we present a novel framework, called BiOcularGAN, capable of generating synthetic large-scale datasets of photorealistic (visible light and near-infrared) ocular images, together with corresponding segmentation labels, to address these issues. At its core, the framework relies on a novel Dual-Branch StyleGAN2 (DB-StyleGAN2) model that facilitates bimodal image generation, and a Semantic Mask Generator (SMG) component that produces semantic annotations by exploiting latent features of the DB-StyleGAN2 model. We evaluate BiOcularGAN through extensive experiments across five diverse ocular datasets and analyze the effects of bimodal data generation on image quality and the produced annotations. Our experimental results show that BiOcularGAN is able to produce high-quality matching bimodal images and annotations (with minimal manual intervention) that can be used to train highly competitive (deep) segmentation models (in a privacy-aware manner) that perform well across multiple real-world datasets. The source code for the BiOcularGAN framework is publicly available at https://github.com/dariant/BiOcularGAN.
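To convey the bimodal-generation idea, here is a deliberately tiny dual-branch sketch: one shared synthesis trunk feeding separate visible-light and near-infrared output branches, so the two modalities stay aligned by construction. It is a toy stand-in, not the DB-StyleGAN2 architecture.

```python
import torch
import torch.nn as nn

class DualBranchHead(nn.Module):
    """Toy dual-branch generator head: a shared trunk with two per-modality
    branches (VIS and NIR), loosely mirroring the bimodal-generation idea."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU())
        self.vis_branch = nn.Linear(128, 3 * 32 * 32)  # visible-light image
        self.nir_branch = nn.Linear(128, 1 * 32 * 32)  # near-infrared image

    def forward(self, z):
        h = self.trunk(z)
        vis = self.vis_branch(h).view(-1, 3, 32, 32)
        nir = self.nir_branch(h).view(-1, 1, 32, 32)
        return vis, nir  # aligned bimodal outputs from one latent code

vis_img, nir_img = DualBranchHead()(torch.randn(2, 64))
```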
Vector Symbolic Architectures (VSAs) combine a high-dimensional vector space with a set of carefully designed operators in order to perform symbolic computations with large numerical vectors. The main goal is to exploit their representational power and their ability to deal with fuzziness and ambiguity. Over the past years, several VSA implementations have been proposed. The available implementations differ in the underlying vector space and in the specific realizations of the VSA operators. This paper provides an overview of eleven available VSA implementations and discusses the commonalities and differences in their underlying vector spaces and operators. We create a taxonomy of the available binding operations and use an example from analogical reasoning to show an important ramification of non-self-inverse binding operations. The main contribution is an experimental comparison of the available implementations, evaluating (1) the capacity of bundles, (2) the approximation quality of non-exact unbinding operations, (3) the influence of combining binding and bundling operations on query-answering performance, and (4) the performance on two example applications: visual place recognition and language recognition. We expect this comparison and systematization to be relevant for the development of VSAs and to support the selection of an appropriate VSA for a particular task. The implementations are available.
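As a concrete example of these operators, the sketch below implements binding and bundling for one common implementation, the MAP architecture with bipolar vectors, where binding is self-inverse. The dimensionality and the majority-sign bundling rule are the usual choices but are assumptions here; other VSAs (e.g., HRR with circular convolution) use non-self-inverse binding and need a separate unbinding operator.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # dimensionality of the hypervectors

def rand_vec():
    """Random bipolar hypervector, as used by the MAP architecture."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    return a * b  # element-wise multiplication (self-inverse in MAP)

def bundle(*vs):
    return np.sign(np.sum(vs, axis=0))  # element-wise majority vote

# Encode two role-filler pairs and superpose them in a single vector.
color, shape, red, square = (rand_vec() for _ in range(4))
record = bundle(bind(color, red), bind(shape, square))

# Because MAP binding is self-inverse, unbinding reuses the same operator.
query = bind(record, color)
print(query @ red / D, query @ square / D)  # ~0.5 vs. ~0.0 similarity
```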
Supervised question answering (QA) systems rely on domain-specific, human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the resulting question-answer pairs to train a BERT-based state-of-the-art QA system. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed from subjects (or objects) and predicates, while the objects (or subjects) serve as answers. Experiments on five extractive QA datasets demonstrate that our technique achieves performance on par with existing state-of-the-art QA systems, with the benefit of being trained on an order of magnitude fewer documents and without recourse to external reference data sources.
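A minimal sketch of turning one OpenIE triple into a synthetic QA pair: the predicate and one argument form the question, and the remaining argument becomes the answer. The template wording is an illustrative assumption, not the paper's exact question-formation scheme.

```python
def qa_pair_from_triple(subject, predicate, obj):
    """Form a synthetic QA pair from an OpenIE triple <subject, predicate,
    object>: the predicate and the object build the question, and the
    subject serves as the answer (the symmetric variant would ask for the
    object instead). Template wording is an illustrative assumption."""
    question = f"What {predicate} {obj}?"
    return question, subject

print(qa_pair_from_triple("Marie Curie", "discovered", "polonium"))
# ('What discovered polonium?', 'Marie Curie')
```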